Statistics of cycles in large networks

We present a Markov chain Monte Carlo method for sampling cycle length in large graphs. Cycles
are treated as microstates of a system with many degrees of freedom. Cycle length corresponds to
energy, such that the length histogram is obtained as the density of states from Metropolis sampling.
In many growing networks, mean cycle length increases algebraically with system size. The cycle
exponent $\alpha$ is characteristic of the local growth rules and not determined by the degree
exponent $\gamma$. For example, $\alpha = 0.76(4)$ for the Internet at the Autonomous Systems level.

PACS numbers: 89.75.Hc,02.70.Uu,89.20.Hh 



Physics research into graphs and networks has begun to provide a common framework for the
analysis of complex systems in diverse areas including the Internet, biochemistry of living
cells, ecosystems, and social communities. The graph representation of these systems as
discrete units coupled by links (nodes and edges) exhibits a large set of scaling phenomena
including fractal dimension [3] and hierarchy of modules.

A fundamental observation is the scale-free nature of many networks. The fraction of nodes
with a given number of connections, called degree $k$, decays as a power law,
$P(k) \sim k^{-\gamma}$ for large $k$. For typical exponents $\gamma < 3$, the highly
inhomogeneous density of connections can give rise to efficient information transfer and
enhanced failure tolerance.

Besides the degree distribution and node-node distances, the presence of cycles is a relevant
property of networks. A cycle is a closed path that does not intersect itself. Initially,
mainly cycles of the minimal length $h = 3$ were considered, since a high abundance of
triangles is taken as a sign of a clustered structure. Longer cycles have gained attention
recently. Approximations for the system-size scaling of the number $c(h)$ of cycles of length
$h$ have been derived for various types of artificial networks. It has been speculated [15]
that for generic networks the distribution $c(h)$ becomes sharply peaked in the limit of large
networks, $N \to \infty$. For the position of the peak, an algebraic growth
$\langle h \rangle \sim N^{\alpha}$ with an exponent $\alpha < 1$ has been conjectured as the
leading characteristic [15].

Verification of these fundamental conjectures, validity checks of the analytical
approximations, and comparisons with real-world networks have been difficult so far, since an
efficient method for finding the cycle length distribution of a given network has been
lacking. Direct enumeration of all cycles is feasible only for small networks because, in most
cases, the number of cycles increases exponentially with the number of nodes. Approximation by
efficient sampling appears to be the only way to investigate the cycle structure numerically
in the general case. Taking a step in this direction, Rozenfeld and co-authors have introduced
a stochastic search for cycles as self-avoiding random walks on the network. Although the
method allows for a quick scan of cycles on small networks, larger systems cannot be treated
because the probability of finding a given cycle is strongly suppressed with growing cycle
length. We therefore suggest an alternative method that does not involve random walks on the
network.





FIG. 1: (a) Summation of two cycles resulting in a new cycle. 
Edges contained in either addend are contained in the sum. 
Edges present in both addends (dashed lines) cancel out. (b) 
Example of a sum of two cycles that is not a cycle itself. 



We approximate the cycle length distribution by a Monte Carlo algorithm that considers cycles
as discrete microstates of a physical system. Elementary transitions between cycles, the
analogues of single spin flips in a spin system, are defined as addition or removal of short
detours with minimal change to cycle length. By considering cycle length as energy, generic
Monte Carlo procedures from statistical mechanics become applicable. Temperature is defined in
the usual way and allows the sampling to be tuned toward long or short cycles. After
introducing the algorithm in detail, we test its accuracy for a set of networks where the
cycle length distribution is directly accessible for comparison. We apply the algorithm to
models of growing networks and find the growth exponent of the mean cycle length. Finally, we
test scaling of the number of cycles in the growing Internet.

The formulation of the algorithm uses the following basic notions of cycle space. We treat a
subgraph $X$ as the set of edges it contains. If $X$ is a cycle, the cardinality $|X|$ is the
cycle length. The sum of two subgraphs $X$ and $Y$ is defined as
$X \oplus Y = (X \cup Y) \setminus (X \cap Y)$, i.e. an edge is contained in the sum if it is
in one of the addends but not in both. The sum $X \oplus Y$ of two cycles $X$ and $Y$ is again
a cycle if $X$ and $Y$ intersect in a suitable way, see Fig. 1. We generate a Markov chain of
cycles $(C_0, C_1, C_2, \ldots)$ as follows. The initial condition is the empty graph
$C_0 = \emptyset$ at $t = 0$. At each step a cycle $S$ is drawn at random from a set $M$ of
initially known cycles (the choice of $M$ is described below). If the proposal
$C' = C_t \oplus S$ is a cycle or the empty graph, it is accepted with probability



$$P_{\mathrm{accept}} = \min\left\{ \exp\left[ -\beta \left( |C'| - |C_t| \right) \right],\, 1 \right\} . \qquad (1)$$



In case of acceptance we set $C_{t+1} = C'$, otherwise $C_{t+1} = C_t$. This is the Metropolis
update scheme with inverse temperature $\beta$ and cycle length as energy. Subgraphs that are
not cycles are treated as states with infinite energy $E = +\infty$ if $\beta > 0$ (or
$E = -\infty$ if $\beta < 0$, respectively), such that they are always rejected.
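The update rule of Eq. (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`edge`, `is_cycle`, `metropolis_step`, `sample_lengths`, `log_cycle_counts`) are hypothetical, edge sets are stored as frozensets, and the move set is passed in by hand. The last function relies on the standard relation that a chain sampled at inverse temperature $\beta$ visits length $h$ with frequency proportional to $c(h) e^{-\beta h}$, so that $\ln c(h)$ follows from the histogram up to a constant, fixed here by $c(0) = 1$.

```python
import math
import random
from collections import Counter

def edge(u, v):
    """Undirected edge as a hashable, order-free pair (illustrative helper)."""
    return frozenset((u, v))

def is_cycle(edges):
    """True if the edge set is one closed path without self-intersection."""
    if not edges:
        return False
    nbrs = {}
    for e in edges:
        u, v = tuple(e)
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    if any(len(n) != 2 for n in nbrs.values()):
        return False                      # some vertex is not visited exactly once
    start = next(iter(nbrs))
    seen, stack = {start}, [start]
    while stack:                          # connectivity check
        for w in nbrs[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nbrs)

def metropolis_step(current, moves, beta, rng):
    """One update: propose current XOR S and accept with probability of Eq. (1)."""
    proposal = current ^ rng.choice(moves)        # symmetric difference of edge sets
    if proposal and not is_cycle(proposal):
        return current                            # non-cycles are always rejected
    log_p = -beta * (len(proposal) - len(current))
    if log_p >= 0 or rng.random() < math.exp(log_p):
        return proposal
    return current

def sample_lengths(moves, beta, steps, seed=0):
    """Run the chain from the empty graph and record the length after each step."""
    rng = random.Random(seed)
    current = frozenset()
    lengths = []
    for _ in range(steps):
        current = metropolis_step(current, moves, beta, rng)
        lengths.append(len(current))
    return lengths

def log_cycle_counts(lengths, beta):
    """Estimate ln c(h) from one histogram, normalized to c(0) = 1 if possible."""
    hist = Counter(lengths)
    raw = {h: math.log(n) + beta * h for h, n in hist.items()}
    return {h: v - raw[0] for h, v in raw.items()} if 0 in raw else raw
```

For two triangles sharing an edge, passing the two triangles as `moves` lets the chain visit the empty graph, both triangles, and the 4-cycle formed by their sum.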

Throughout this paper, we take $M$ as the set of short (isometric) cycles of the given graph.
A cycle $S$ is short if for all vertices $x$ and $y$ on $S$, a shortest path between $x$ and
$y$ also lies in $S$. As a non-short cycle has at least one short-cut between two of its
vertices, it can be decomposed into two shorter cycles that overlap on the short-cut.
Typically, for each non-short cycle $C$ one finds cycles $S$ and $C'$ such that
$C = C' \oplus S$, $S$ is short, and $|C'| < |C|$. Applying the decomposition recursively, one
sees that every cycle $C$ occurs in a sequence $\emptyset, C_1, C_2, \ldots$ with
$C_i \oplus C_{i+1} \in M$ and $|C_i| < |C_{i+1}|$. Thus taking the set of short cycles as the
set $M$ of possible "moves" not only ensures that every cycle can be reached (ergodicity); the
resulting energy landscape also has no local minima other than the unique global minimum,
which is the empty graph at $E = 0$. There are exceptional graphs where the decomposability
does not hold for one particular cycle. The exceptions appear to be irrelevant for the
applications here, as our numerical results remain unchanged when $M$ is expanded to include
more and longer (non-short) cycles.
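The definition of a short (isometric) cycle translates directly into a distance test: a cycle is short exactly when no pair of its vertices is closer in the graph than along the cycle. The following Python sketch checks this with breadth-first search. The function names and the adjacency-dictionary input format are assumptions made for illustration, and the sketch does not address how the full set $M$ is enumerated.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_short_cycle(adj, cycle):
    """cycle: vertices in cyclic order, each listed once.

    Returns False as soon as the graph offers a short-cut between two
    cycle vertices, i.e. a path shorter than both arcs of the cycle.
    """
    n = len(cycle)
    for i, u in enumerate(cycle):
        dist = bfs_distances(adj, u)
        for j in range(i + 1, n):
            along = min(j - i, n - (j - i))   # distance measured on the cycle
            if dist[cycle[j]] < along:
                return False
    return True
```

In a graph made of two triangles sharing an edge, both triangles pass the test, while the surrounding 4-cycle fails because the shared edge is a short-cut.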

FIG. 2: Number $c(h)$ of cycles of length $h$ estimated by the MC sampling algorithm (thick
dashed curves) and the exact values from iterating Eq. (2) (thin solid curves). Studied
networks are generations $n = 4, \ldots, 8$ (system sizes $N = 42, 123, 366, 1095, 3282$
vertices) from the deterministic growth model [17]. Given a network, a histogram is generated
for each inverse temperature $\beta \in [-5.0, \ldots, +3.0]$ in steps of
$\Delta\beta = 0.1$. Each histogram is based on the lengths of the last 10 s cycles of a
Markov chain of total length 2 x 10 s . Then histograms are merged by choosing the relative
normalization such that the sum of squares of deviations in the overlapping region of adjacent
histograms is minimized. The normalization of the final histogram is chosen such that
$c(0) = 1$. Results are robust against variation of the chain length.

Let us first test the algorithm on a set of networks where exact computation of $c(h)$ is
feasible. The pseudo-fractal scale-free web by Dorogovtsev and Mendes [17] grows
deterministically by iterative triangle formation as follows. Start at generation $n = 0$ with
two vertices connected by an edge. To obtain generation $n + 1$, for each edge $xy$ present in
generation $n$ add a new vertex $z$ and the edges $xz$ and $yz$, such that each existing edge
$xy$ becomes part of an additional triangle $xyz$. The calculation of $c(h)$ is particularly
simple because each cycle has a unique predecessor in the previous generation, given by
following direct links $xy$ instead of the additional "detours" via $z$. A cycle of length $h$
in generation $n$ produces $2^h$ cycles in generation $n + 1$ as the result of $h$ binary
decisions to follow the detour or the original direct edge. The histogram of cycle lengths
iterates as

$$c^{(n+1)}(l) = \sum_{h = \lceil l/2 \rceil}^{l} \binom{h}{l - h}\, c^{(n)}(h) \qquad (2)$$

for $l \geq 4$ and $c^{(n+1)}(3) = c^{(n)}(3) + 3^n$. The result of the numerical iteration of
these equations up to generation $n = 8$ is shown in Fig. 2 together with the results from the
Monte Carlo method. The relative deviation of the sampling estimate of $c(h)$ from the exact
value is below 25% for all cycle lengths $h$ and all generations $n$. In particular, the
unique cycle of maximum length $h_{\max} = 3 \times 2^{n-1}$ is detected. The method thus
approximates the true numbers of cycles with good precision.
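The exact cycle counts for the deterministic model can be reproduced with a few lines of Python. The sketch below is a reconstruction from the growth rule stated above, not the authors' code: a cycle of length $h$ that takes $j$ of its $h$ possible detours has length $h + j$, which can happen in $\binom{h}{j}$ ways, and every edge of generation $n$ (there are $3^n$ of them) contributes one new triangle. The function names are hypothetical, and generation 0 is assumed to be the single edge.

```python
from math import comb  # available from Python 3.8

def next_generation(counts, n):
    """Map the length histogram of generation n to that of generation n + 1.

    counts: dict {cycle length: number of cycles} for generation n.
    """
    new = {3: 3 ** n}                    # one new triangle per existing edge
    for h, number in counts.items():
        for detours in range(h + 1):     # choose which edges take the detour
            length = h + detours
            new[length] = new.get(length, 0) + comb(h, detours) * number
    return new

def cycle_counts(generation):
    """Exact c(h) for the requested generation (generation 0 has no cycles)."""
    counts = {}
    for n in range(generation):
        counts = next_generation(counts, n)
    return counts
```

For generation 2 (the initial triangle with one extra vertex on each edge) this gives four triangles, three 4-cycles, three 5-cycles and one 6-cycle, which can be confirmed by hand.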

FIG. 3: System size dependence of the cycle length distribution in growing networks. (a) Mean
cycle length for the four stochastic attachment rules (□, ○, ▽, △) and the deterministic
attachment rule (◇). For the Internet (*), system size $N$ has been rescaled by a factor of 20
to fall into the displayed range. Dashed straight lines indicate growth exponents 1 and
$\ln 2 / \ln 3 \approx 0.63$ for comparison. (b) Relative variance of the cycle length
distribution for the same networks (same symbols as in (a)). In both panels, data points for
the stochastic growth models are averages over 10 network realizations each. Error bars
indicate the standard deviation over realizations.

Now we apply the algorithm to study the system size dependence of the cycle length
distribution of stochastically growing artificial networks. All networks start as two vertices
coupled by an edge. The networks grow by iterative attachment of vertices until a desired size
$N$ is reached. At each iteration, one new vertex $z$ and two new edges $xz$ and $yz$ are
generated. We are interested in the influence different attachment mechanisms have on the
cycle length distribution. Therefore we distinguish four probabilistic rules for selection of
the nodes $x$ and $y$ to which the new node $z$ attaches. Independent homogeneous (IH)
attachment: Draw $x$ and $y$ randomly (with equal probabilities) and independently from the
set of nodes; if $x = y$, discard this choice and repeat. Independent preferential (IP)
attachment: Draw an edge randomly (all edges having equal probability) and take as $x$ one of
the end vertices chosen with equal probability; draw another edge to find $y$ analogously; if
$x = y$, discard this choice and repeat. Triangle forming preferential (TP) attachment: Draw
an edge randomly and take its two end vertices as $x$ and $y$. Triangle forming homogeneous
(TH) attachment: Draw an edge randomly, take $x$ and $y$ as its end vertices and accept this
choice with probability $1 / (\deg(x) \deg(y))$; otherwise reject and repeat.
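The four attachment rules can be sketched compactly in Python. This is an illustrative reading of the rules as stated above, not the code used for the paper: the function names, the integer node labels, and the edge-list representation are assumptions.

```python
import random

def pick_pair(edges, degree, rule, rng):
    """Select the two existing nodes x, y that the new node attaches to."""
    while True:
        if rule == "IH":      # two nodes, uniformly and independently
            x, y = rng.randrange(len(degree)), rng.randrange(len(degree))
        elif rule == "IP":    # random end of a random edge, drawn twice
            x, y = rng.choice(rng.choice(edges)), rng.choice(rng.choice(edges))
        elif rule == "TP":    # both ends of one random edge
            x, y = rng.choice(edges)
        elif rule == "TH":    # as TP, thinned to cancel the degree bias
            x, y = rng.choice(edges)
            if rng.random() >= 1.0 / (degree[x] * degree[y]):
                continue      # rejected, draw again
        else:
            raise ValueError("unknown rule: " + rule)
        if x != y:
            return x, y

def grow(n_nodes, rule, seed=0):
    """Grow a network from a single edge until it has n_nodes nodes."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    degree = {0: 1, 1: 1}
    while len(degree) < n_nodes:
        x, y = pick_pair(edges, degree, rule, rng)
        z = len(degree)                  # label of the new node
        edges += [(x, z), (y, z)]
        degree[x] += 1
        degree[y] += 1
        degree[z] = 2
    return edges
```

Drawing a random end of a random edge selects a node with probability proportional to its degree, which is why rule IP needs no explicit degree bookkeeping; under TH the rejection step can take many trials when high-degree nodes are present.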

Rule IP is equivalent to choosing nodes with probability proportional to degree [6], so-called
preferential attachment. It generates scale-free networks with degree exponent $\gamma = 3$.
Rule TP implements preferential attachment with the additional constraint that $x$ and $y$ be
connected; it is the stochastic version of the pseudo-fractal (PF) scale-free web [17] defined
above. The resulting networks are scale-free with $\gamma = 3$. The homogeneous attachment
rule (IH) [6] leads to networks with exponentially decaying degree distribution
($\gamma = \infty$). The fourth rule (TH) introduced here combines triangle formation with
homogeneous attachment by explicitly canceling out the degree dependence in the selection
probability. We have checked that this rule generates an exponential degree distribution.

TABLE I: Networks with different attachment rules and the resulting scaling exponents $\gamma$
for the tail of the degree distribution and $\alpha$ for the growth of the cycle lengths. The
last column displays the symbol used in Fig. 3.

rule        indep / tri    hom / pref      $\alpha$    $\gamma$    symbol
IH [6]      independent    homogeneous     1.010(4)    $\infty$    □
IP [6]      independent    preferential    0.969(5)    3           ○
TH          triangle       homogeneous     0.722(5)    $\infty$    ▽
TP          triangle       preferential    0.644(9)    3           △
PF [17]     triangle       preferential    0.635(1)    2.59        ◇
Internet                                   0.76(4)     2.22(1)     *

As shown in Fig. 3(a), the mean cycle length increases algebraically with system size,

$$\langle h \rangle \sim N^{\alpha} , \qquad (3)$$

with the exponent $\alpha \in [0, 1]$ depending on the attachment rule. The variance of the
cycle length distribution increases algebraically with the same exponent $\alpha$. Therefore
the ratio between variance and mean is practically constant, see Fig. 3(b). Considering the
degree exponent $\gamma$ and the cycle growth exponent $\alpha$ for each type of network
(Table I), several observations are worth mentioning. Homogeneous attachment with triangle
formation leads to a non-trivial cycle growth exponent $\alpha \approx 0.72$ even in the
absence of scaling in the degree distribution, $\gamma = \infty$. Networks grown
stochastically with triangle formation and preferential attachment (rule TP) have the same
exponent $\alpha \approx 0.64$ as the deterministic counterpart (rule PF), while the degree
exponents under these two rules are clearly different. Analogously, in the absence of triangle
formation (rules IH and IP) the same cycle growth exponent $\alpha \approx 1.0$ is obtained
regardless of the degree exponents $\gamma \in \{3, \infty\}$.
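An exponent such as $\alpha$ in Eq. (3) is commonly estimated as the slope of a straight-line fit in log-log coordinates. The Python sketch below shows an ordinary least-squares version of that step. It is a generic illustration with a hypothetical function name, and it does not reproduce the authors' fitting procedure or their error estimates.

```python
import math

def fit_exponent(sizes, means):
    """Least-squares fit of ln(mean) = alpha * ln(N) + const.

    Returns (alpha, const). Inputs are equal-length sequences of
    system sizes N and mean cycle lengths <h>, all positive.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(m) for m in means]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return alpha, mean_y - alpha * mean_x
```

Applied to synthetic data that follow an exact power law, for example `means = [2.0 * n ** 0.64 for n in sizes]`, the function returns the exponent used to generate them; real data additionally require an uncertainty estimate for the slope.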

Finally we consider cycles in an evolving real-world network. The Internet at the level of
Autonomous Systems is a growing scale-free network with degree exponent $\gamma = 2.22(1)$
[20, 21]. Here we analyze snapshots of the network with sizes from $N = 3015$ nodes (November
1998) to $N = 10515$ nodes (March 2001). We find that during this time the mean cycle length
grows from 264.9 to 757.8, as plotted in Fig. 3(a). As in the artificial growing networks, the
growth is algebraic. The growth exponent is estimated as $\alpha = 0.76(4)$ by a least squares
fit. A more detailed analysis is performed on the number $c(h, N)$ of cycles of given length
$h$ at system size $N$, plotted in Fig. 4(a). We observe a scaling

$$c(h, N) \sim N^{\xi(h)} \qquad (4)$$

with an exponent that depends linearly on $h$ with a slope close to unity. Figure 4(b) shows
that

$$\xi(h) \propto h \qquad (5)$$

for not too small lengths $h > 10$. The scaling behavior is in qualitative agreement with the
prediction from the first order approximation by Bianconi et al., assuming that the Internet
is a random network with a given scale-free degree distribution.

FIG. 4: Evolution of cycles in the growing Internet at the Autonomous Systems level. (a) The
number of cycles of given length $h$ as a function of system size $N$ for
$h = 10, 20, 30, \ldots, 100$ (squares, bottom to top). The straight lines are best fits of
the form $c(h, N) \propto N^{\xi(h)}$. (b) Growth exponents $\xi(h)$ as defined in Eq. (4),
obtained as slopes of the fitted lines in (a). Error bars of exponents indicate the standard
error from the fit. Dashed lines have slopes 1.0 and 0.9.

In summary, we have introduced a method for sampling cycles in large graphs. We have
identified cycle space with the state space of a system with many degrees of freedom, thereby
making Monte Carlo techniques from statistical mechanics applicable. In this framework, we
have analyzed the evolution of cycles in growing networks. While the mean cycle length grows
with a characteristic exponent $\alpha$, the relative width of the length distribution tends
to zero as the system size increases. Thus, in agreement with an earlier speculation [15], the
exponent $\alpha$ is found to be the most relevant quantity for the evolution of cycle space.
In the scale-free model by Barabási and Albert [6] as well as the growth model with random
homogeneous attachment, cycles are space-filling ($\alpha = 1.0$), i.e. cycle length is
proportional to system size. In model networks with explicit formation of triangles and in the
Internet, however, cycles grow slower than the system as a whole. This class of networks
having $\alpha < 1$ also includes single-scale networks with $\gamma = \infty$. Our study
suggests that the cycle growth exponent may serve as a characterization of growing networks
independent of the degree exponent $\gamma$. An open question concerns universality. Can
$\alpha$ be altered continuously by tuning parameters, or does it assume distinct values,
separating growing networks into universality classes?